{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to recommender systems\n",
    "[Early, early draft]\n",
    "\n",
    "This chapter introduces recommender systems (commonly called RecSys),\n",
    "tools that recommmend *items* to *users*.\n",
    "Many of the most popular uses of recommender systems \n",
    "involve to suggesting products to customers.\n",
    "Amazon, for example, uses recommender systems to choose which retail products to display.\n",
    "Recommender systems aren't limited to physical products. \n",
    "For example, the algorithms that Pandora and Spotify use to curate playlists\n",
    "are recommender systems.\n",
    "Personalized suggestions on news websites are recommender systems.\n",
    "And as of this writing, several carousels on the home page for \n",
    "Amazon's Prime Videos's contain personalized TV and Movie recommendations.\n",
    "\n",
    "![](../img/recommended-prime-tv.png)\n",
    "\n",
    "I (Zack) have honestly no idea why Amazon wants me to watch Bubble Guppies. \n",
    "It's possible that Bubble Guppies is a masterpiece,\n",
    "and the recommender systems knows that my life will change upon watching it.\n",
    "It's also possible that the recommender made a mistake.\n",
    "For example, it might have extrapolated incorrectly from my affinity for the anime Death Note,\n",
    "thinking that I would similarly love any animated series.\n",
    "And, since I've never rated a nickelodean series (either postiively or negatively),\n",
    "the system may have no knowledge to the contrary.\n",
    "It's also possible that this series is a new addition to the catalogue,\n",
    "and thus they need to recommend the item to many users in ordder to develop a sense of *who* likes Bubble Guppies.\n",
    "This problem, of sorting out how to handle a new item, is called the *cold-start* problem.\n",
    "\n",
    "\n",
    "A recommender system doesn't have to use any sophisticated machine learning techniques.\n",
    "And it doesn't even have to be personalized.\n",
    "One reasonable baseline for most applications \n",
    "is to suggest the most popular items to everyone. \n",
    "But we have to be careful.\n",
    "Depending on how we define popularity,\n",
    "we might create a feedback loop.\n",
    "The most popular items get recommended which makes them even more popular,\n",
    "which makes them even more frequently recommended, etc.\n",
    "\n",
    "For services with diverse users,\n",
    "however, personalization can be essential.\n",
    "Diapers are among the most popular items on Amazon,\n",
    "but we probably shouldn't recommend diapers \n",
    "to adolescents. \n",
    "We also probably *should not* recommend anything associated with Justin Bieber\n",
    "to a user who *isn't* an adolescent. \n",
    "Moreover, we might want to personalize, not only to the user, but to the context.\n",
    "For example, just after I bought a Pixel phone,\n",
    "I was in the market for a phone case.\n",
    "But I have no interested in buying a phone case one year later.\n",
    "\n",
    "\n",
    "## Many ways to pose the problem \n",
    "\n",
    "While it might seem obvious,\n",
    "that personalization is a good strategy,\n",
    "it's not immediately obvious how best to articualate \n",
    "recommendation as a machine learning problem. \n",
    "\n",
    "Discuss:\n",
    "* Rating prediction\n",
    "* Passive feedback (view/notview)\n",
    "* Content-based recommendation\n",
    "\n",
    "## Amazon review dataset\n",
    "\n",
    "* introduce dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import mxnet\n",
    "import mxnet.ndarray as nd\n",
    "import urllib\n",
    "import gzip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with gzip.open(urllib.request.urlopen(\"http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Grocery_and_Gourmet_Food_5.json.gz\")) as f:\n",
    "    data = [eval(l) for l in f]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'asin': '616719923X',\n",
       " 'helpful': [0, 0],\n",
       " 'overall': 4.0,\n",
       " 'reviewText': 'Just another flavor of Kit Kat but the taste is unique and a bit different.  The only thing that is bothersome is the price.  I thought it was a bit expensive....',\n",
       " 'reviewTime': '06 1, 2013',\n",
       " 'reviewerID': 'A1VEELTKS8NLZB',\n",
       " 'reviewerName': 'Amazon Customer',\n",
       " 'summary': 'Good Taste',\n",
       " 'unixReviewTime': 1370044800}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Do some dataset exploration]\n",
    "* Look at the average rating\n",
    "* Look at the number of unique users and items\n",
    "* Plot a histogram of the number of ratings/reviews corresponding to each user\n",
    "* \"\" for items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "users = [d['reviewerID'] for d in data]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "items = [d['asin'] for d in data]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ratings = [d['overall'] for d in data]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Models \n",
    "* Just the average\n",
    "* Offset plus user and item biases\n",
    "* Latent factor model / matrix factorization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
